Abstract: Social learning encompasses situations in which 
agents attempt to learn from observing the actions of other 
agents. It is well known that in some cases this can lead to 
information cascades in which agents blindly follow the actions 
of others, even though this may not be optimal. Having agents 
provide reviews in addition to their actions provides one possible 
way to avoid “bad cascades.” In this paper, we study one 
such model where agents sequentially decide whether or not to 
purchase a good, whose true value is either good or bad. If 
they purchase the good, agents also leave a review, which may 
be noisy. Conditioning on the underlying state of the world, we 
study the impact of such reviews on the asymptotic properties 
of cascades. For a good underlying state, using Markov analysis 
we show that depending on the noise level, reviews may in fact 
increase the probability of a wrong cascade. On the other hand, 
for a bad underlying state, we use martingale analysis to bound 
the tail-probability of the time until a correct cascade happens. 

I. Introduction 

People often seek to learn from observing others when faced 
with new decisions. On-line platforms facilitate acquiring such 
information at a much greater scale than was previously 
possible. A basic question is then to understand how such 
information facilitates learning. One common approach for 
studying such questions is as a game among Bayesian agents. 
These agents sequentially make a binary decision given 
their own private information as well as observations of the 
decisions of previous agents. A key result, first shown in [2] 
and [3], is that in such models herding or an information 
cascade can occur in which from some point onward all agents 
ignore their private information and follow the actions of the 
previous agents. Though individually optimal, this may result 
in the agents making a choice that is not socially optimal. 

One reason for incorrect herding is that agents observe the 
actions of other agents before the other agents receive their 
pay-off, and so these actions reflect the agents’ estimates of 
the true pay-off and not the true pay-off itself. Indeed, if agents 
instead were able to see the true pay-off obtained by others, 
then as shown in [9] there would never be an incorrect cascade 
in which agents buy a bad product. The use of reviews and 
on-line recommendation systems can be viewed as an attempt 
to provide other agents with this information. However, due 
for example to user errors, such reviews may only be a noisy 
representation of this information (instead of the true pay-off 
as in [9]). Studying social learning in the presence of such 
noisy reviews is the objective of this paper. More precisely, we 
consider a variation of the model in [2], [3], where agents have 
the option to either buy or not buy a given item, whose true 
value is one of two states (good or bad). In addition to
the actions of the previous agents, agents also see a history of
reviews before making their decisions. However, these reviews
are not a perfect indication of the true state of the good due to
two effects: first, as we have already mentioned, these reviews
are noisy; second, agents can only leave a review if they
buy the good, so no additional information is given by
agents that choose not to buy.¹

Adding reviews is a way of changing the information
structure in [2], [3]. A variety of other papers have
considered other changes to this structure, such as
changing the underlying network structure among the agents,
e.g. [8], or changing the signal structure, e.g. [6]. In prior
work ([10], [11]), we considered a variation of the information
structure in which agents observed noisy versions of the
actions of others. This led to the following counter-intuitive
result: the probability of incorrect herding is non-monotonic
in the noise level. In other words, in some cases, more noise is
actually beneficial. In this paper, we again seek to study how
variations in noise level affect the agents' behavior. However,
here, agents perfectly observe the actions of previous agents
and the only noise is in the reviews. Additionally, since only
agents who buy the good can submit reviews, this leads to an
asymmetry in the model that was not present in [10], [11].

The asymmetry in reviewing leads to an asymmetry in the 
resulting user behavior depending on the underlying state of 
the product. There is still the possibility of an incorrect cascade 
when the product is “good”. This is because once agents stop 
buying, no further reviews are generated and so there is no new 
information to stop a cascade. However, when the product is 
“bad” an incorrect cascade cannot persist; eventually enough 
negative reviews will be generated to stop it. We analyze these 
two cases separately. Conditioned on the state of the product 
being good, we study the probability of an incorrect cascade 
and show that if reviews are too noisy, they actually lead to a 
higher probability of this occurring. Conditioned on the state 
being bad, we instead focus on the time until a correct cascade 
occurs and give a tail bound on this probability that illustrates 
the impact of review quality. 

Another strand of related work is the literature on “word-of-mouth”
learning (e.g. [4], [5], [7]), in which agents can communicate
information about the payoff of past actions. However,
these models consider different settings (e.g. naive rule-of-thumb
decision rules, random sampling of the population), while
our paper assumes that fully-rational agents can observe all past
actions and reviews.

¹ For example, many on-line platforms such as Amazon.com indicate verified
purchase reviews; in our model only such reviews are considered.

The remainder of the paper is organized as follows. In Section II
we specify our model. The main results are presented
in Sections III and IV for the cases where the value of the product
is “good” and “bad,” respectively. We conclude in Section V.

II. Model 

We consider a model similar to [2] in which there is a
countable population of agents, indexed $n = 1, 2, \ldots$, with the
index reflecting the time and order of actions of the agents.
Each agent $n$ has an action choice $A_n$ of saying either Yes ($Y$)
or No ($N$) to a new item. The true value ($V$) of the item can
be either 0 (bad) or 1 (good); both possibilities are assumed
to be equally likely. To reflect the agents' prior knowledge
about the true value of the item, we assume that each agent
$n$ receives a private signal $S_n \in \{H \text{ (High)}, L \text{ (Low)}\}$. An
agent $n$ who chooses $A_n = N$ does not submit a
review. However, if $A_n = Y$ then agent $n$ submits a review
$R_n \in \{G \text{ (Good)}, B \text{ (Bad)}\}$ representing his experience with
the item after purchasing. Assume the probability that a private
signal (resp. a review) aligns with $V$ is $p \in (0.5, 1)$ (resp.
$\delta \in [0.5, 1]$). That is:

$$P[S_n = H \mid V = 0] = P[S_n = L \mid V = 1] = 1 - p,$$

$$P[S_n = H \mid V = 1] = P[S_n = L \mid V = 0] = p,$$

and if $A_n = Y$,

$$P[R_n = G \mid V = 1] = P[R_n = B \mid V = 0] = \delta,$$

$$P[R_n = G \mid V = 0] = P[R_n = B \mid V = 1] = 1 - \delta.$$

We consider a homogeneous population where, conditioned
on $V$, the private signals and reviews are i.i.d. across all agents.
Since $p \in (0.5, 1)$, the private signals are informative but not
revealing; we call $p$ the signal quality. On the other hand,
$\delta$ denotes the review strength, which is independent of the
signal.²

We assume that each agent takes his one-time action in an
exogenous order, and that the history of actions and reviews is
public information to subsequent agents. The agents are Bayes-rational
and base their decisions on their own private signals and
the public information. Each agent $n$ updates his posterior belief
about the true value $V$ using his private signal $S_n$, the actions
$A_1, \ldots, A_{n-1}$, and the reviews $R_m$ whenever $A_m = Y$ for
$m = 1, \ldots, n-1$.³

² The motivation is that, while signal quality reflects the product's marketing
efficiency, the review strength is a consequence of product reliability from
the manufacturing aspect.

³ For simplicity, we assume indifferent agents follow their signals.

A. Public likelihood ratio as a Markov process

Let $\mathcal{R}_n = R_n$ when $A_n = Y$ and $\mathcal{R}_n = *$ when $A_n = N$.
Then the public history after agent $n$ decides is written
as $H_n = \{A_1, \mathcal{R}_1, \ldots, A_n, \mathcal{R}_n\}$. Agents' decisions are based
on calculations of the posterior probability of $V = 0$ versus
$V = 1$ given the observed history $H_n$. However, due to the
independence of signals from history, agent $n+1$ can instead
compare the public likelihood ratio, $\ell_n$, and his private belief,
$\beta_{n+1}$, of $V = 0$ versus $V = 1$. Using Bayes' rule and the fact that $V$ is
equally likely to be 0 or 1, we can write these in the alternate form:

$$\ell_n = \frac{P[H_n \mid V = 0]}{P[H_n \mid V = 1]} \quad \text{and} \quad \beta_{n+1} = \frac{P[S_{n+1} \mid V = 0]}{P[S_{n+1} \mid V = 1]}. \tag{1}$$

Since $V$ is equally likely to be 1 or 0, $\ell_0 = 1$. The higher $\ell_n$ is, the
more strongly $H_n$ indicates $V = 0$. Moreover, since $H_n$ is
public information, for both true values $\ell_n$ can be updated as:
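• If agent $n$ follows his own signal then:

$$\ell_n = \begin{cases} \dfrac{p}{1-p}\,\ell_{n-1}, & \text{if } A_n = N \\[4pt] \dfrac{1-p}{p}\,\dfrac{1-\delta}{\delta}\,\ell_{n-1}, & \text{if } A_n = Y,\ \mathcal{R}_n = G \\[4pt] \dfrac{1-p}{p}\,\dfrac{\delta}{1-\delta}\,\ell_{n-1}, & \text{if } A_n = Y,\ \mathcal{R}_n = B \end{cases} \tag{2}$$

• Otherwise, if agent $n$ cascades then:

$$\ell_n = \begin{cases} \ell_{n-1}, & \text{if } A_n = N \\[4pt] \dfrac{1-\delta}{\delta}\,\ell_{n-1}, & \text{if } A_n = Y,\ \mathcal{R}_n = G \\[4pt] \dfrac{\delta}{1-\delta}\,\ell_{n-1}, & \text{if } A_n = Y,\ \mathcal{R}_n = B \end{cases} \tag{3}$$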

Thus, given $H_n$, $\{\ell_n\}$ is a Markov process. Moreover, this
is also true if in addition we condition on each value of $V$.⁵
On the other hand, $\beta_{n+1} = (1-p)/p$ (resp. $p/(1-p)$) if
$S_{n+1} = H$ (resp. $S_{n+1} = L$).
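The update rules for the public likelihood ratio in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `update_public_lr` and its argument names are hypothetical, and exact rationals are used merely to make the arithmetic easy to check.

```python
from fractions import Fraction

def update_public_lr(lr_prev, action, review, p, delta, cascading):
    """Return l_n from l_{n-1} (hypothetical helper, not from the paper).

    action is "Y" or "N"; review is "G" or "B" (ignored when action == "N").
    cascading=False applies the non-cascading rule, cascading=True the
    cascading rule. Assumes 0.5 < delta < 1 so no division by zero occurs.
    """
    signal_factor = Fraction(1)
    if not cascading:
        # A non-cascading action reveals the signal: N <-> L, Y <-> H.
        signal_factor = p / (1 - p) if action == "N" else (1 - p) / p
    if action == "N":
        return signal_factor * lr_prev          # no review is generated
    # Likelihood ratio P[review | V=0] / P[review | V=1].
    review_factor = (1 - delta) / delta if review == "G" else delta / (1 - delta)
    return signal_factor * review_factor * lr_prev

p, delta = Fraction(7, 10), Fraction(4, 5)
print(update_public_lr(1, "N", None, p, delta, cascading=False))  # 7/3
print(update_public_lr(1, "Y", "B", p, delta, cascading=True))    # 4
```

Note that an N action during a cascade leaves the ratio unchanged, which is what later makes an N cascade absorbing.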


B. Agents’ decision rule and cascades’ condition 

By (2) and (3), $\ell_n = \left(\frac{1-p}{p}\right)^{a_n} \left(\frac{1-\delta}{\delta}\right)^{r_n}$, where $a_n$ and $r_n$
are integer-valued random variables denoting the
differences in actions ($\#Y - \#N$) and reviews ($\#G - \#B$),
respectively (excluding the actions caused by both types of
cascades, and the reviews that are unavailable, in particular during an $N$
cascade). For convenience, define $x = \log_{\frac{\delta}{1-\delta}} \frac{p}{1-p} \in [0, \infty]$
as the indicator of how strong the reviews are with respect to the
signals. In other words, the lower $x$ is, the stronger the reviews
are relative to the signals. We can rewrite $\ell_n = \left(\frac{1-p}{p}\right)^{h_n}$,
where the exponent $h_n = a_n + \frac{1}{x} r_n$. Since agent $n+1$ makes
his decision by comparing $\ell_n$ to $\beta_{n+1}$, agent $n+1$ cascades
$Y$ if $h_n > 1$, cascades $N$ if $h_n < -1$, and follows his signal
if $h_n \in [-1, 1]$.
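As a small illustration of the decision rule, the sketch below computes $x$ from $p$ and $\delta$, forms the exponent $h_n = a_n + r_n / x$, and classifies the next agent's behavior. The helper names (`review_exponent`, `exponent`, `regime`) are hypothetical, and the sketch assumes $0.5 < \delta < 1$ so that $x$ is finite and positive.

```python
import math

def review_exponent(p, delta):
    """x = log base delta/(1-delta) of p/(1-p); assumes 0.5 < p, delta < 1."""
    return math.log(p / (1 - p)) / math.log(delta / (1 - delta))

def exponent(a, r, x):
    """h_n = a_n + r_n / x, the exponent of the public likelihood ratio."""
    return a + r / x

def regime(h):
    """Behavior of agent n+1 given h_n."""
    if h > 1:
        return "Y cascade"
    if h < -1:
        return "N cascade"
    return "follow signal"

# Reviews exactly as strong as signals give x = 1.
x = review_exponent(0.7, 0.7)
print(regime(exponent(1, 1, x)))   # one net Y and one net G: Y cascade
print(regime(exponent(1, 0, x)))   # h = 1 is still inside [-1, 1]
```

The boundary values $h_n = \pm 1$ fall in the "follow signal" regime, matching the tie-breaking assumption that indifferent agents follow their signals.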

C. Asymmetry under different types of cascade and product’s 
quality 

This model exhibits asymmetric behaviors with respect both 
to the types of cascades Y and N, and to the true value V of 
the item. In particular, the arrival of new information (reviews) 
depends on the action chosen by each agent. 

1) Y versus N cascades: If agent $n$ faces $h_{n-1} > 1$, he
chooses $A_n = Y$ regardless of his signal and thus initiates a $Y$
cascade. A $Y$ cascade does not last forever, unless the reviews
are of perfect quality ($\delta = 1$). For example, if $\mathcal{R}_n = B$, then
$h_n = h_{n-1} - \frac{1}{x}$ could be below 1, which induces agent $n+1$
to use his own signal. Furthermore, if $x$ is sufficiently small
then $h_n < -1$, and agent $n+1$ initiates an $N$ cascade. The
dynamics of a $Y$ cascade, once it gets started, are determined
solely by the review process (and do not depend on the
signals). Regardless of the time a $Y$ cascade was initiated, it
can be broken by a sufficiently long sequence of bad reviews.
Thus, the history process $\{H_n\}$ could include sample paths
where $Y$ cascades start and stop infinitely often.

⁵ This is an extension of results from [6].

On the other hand, once $h_n < -1$, an $N$ cascade starts
and lasts forever. This is because agents who choose $N$ do not
generate reviews; thus the likelihood ratio stays constant as
soon as any agent cascades to $N$. Subsequent agents, who have
the same signal strength, are left in the same state as the one
who initiated the cascade and thus make the same action choice.

2) Good versus bad product: For $V = 1$, a wrong cascade
happens with positive probability. For example, if the first two
agents have $L$ signals, they both choose $N$; therefore no review
is collected. As a result, all subsequent agents are drawn into
an $N$ cascade, which is irreversible. This possibility cannot be
avoided by adjusting the review strength, $\delta$, even to perfect
quality. Even if the reviews are perfect, we would still need a
non-cascading agent who has an $H$ signal for his review to take
effect. In addition, for $V = 1$, it is highly likely that there is
an abundance of new information. If an agent $n$ chooses $Y$, one
review $\mathcal{R}_n$ is added to the common database. Since reviews
are independent of signals, when $V = 1$ more agents choose
$Y$ and new information begets further new information.

In other words, when $V = 1$ the underlying Markov
process has a drift toward the correct cascade, but there is
no absorbing state on that side since $h_n$ is unbounded above.
However, multiple absorbing states for the wrong cascade might
exist. For $V = 1$, the quantity of interest is the probability of a
wrong ($N$) cascade, which is a function of both $p$ and $\delta$. One
would expect the time until a correct cascade to be infinite by
considering the drift of the underlying Markov process. We
will discuss this scenario in Section III.

On the other hand, when $V = 0$, this model exhibits a
different set of behaviors. A wrong cascade can never last forever.
The reason is that as more agents purchase the item, more and
more reviews are collected. Since reviews are informative,
subsequent agents can track the difference in the number of
reviews to eventually learn the true value of $V$. In other words,
while there are only trapping states for the correct cascade, the
drift also leans toward this side. Thus, a correct cascade happens
with probability 1. In this scenario, we are interested in the
distribution of the time (i.e. the number of agents) until a correct
cascade happens. This will be studied in Section IV.
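The asymmetry described in this subsection can be explored with a short Monte Carlo sketch. It is an illustration under stated assumptions, not the simulation used in the paper: the function names, the finite `horizon` (which treats any run without an $N$ cascade by then as never entering one), and the parametrization by $x$ (with $\delta$ recovered from $\delta/(1-\delta) = (p/(1-p))^{1/x}$) are all choices made here.

```python
import random

def simulate_until_n_cascade(p, x, good, horizon, rng):
    """Return the first n with h_n < -1, or None if none occurs
    within `horizon` agents (hypothetical helper)."""
    q = 1 - p
    t = (p / q) ** (1 / x)              # delta / (1 - delta), from the definition of x
    delta = t / (1 + t)
    p_high = p if good else q           # P[S = H | V]
    p_good_review = delta if good else 1 - delta   # P[R = G | V]
    a = r = 0                           # a_n and r_n
    for n in range(horizon):
        h = a + r / x
        if h < -1:
            return n                    # N cascade: absorbing
        if h <= 1:                      # h in [-1, 1]: agent follows own signal
            if rng.random() >= p_high:
                a -= 1                  # signal L: action N, no review
                continue
            a += 1                      # signal H: action Y
        # Action Y (by signal or by Y cascade): a review arrives.
        r += 1 if rng.random() < p_good_review else -1
    return None

def wrong_cascade_frequency(p, x, trials=4000, horizon=200, seed=0):
    """Fraction of runs with V = 1 that end in an N cascade."""
    rng = random.Random(seed)
    hits = sum(simulate_until_n_cascade(p, x, True, horizon, rng) is not None
               for _ in range(trials))
    return hits / trials

print(wrong_cascade_frequency(0.7, 1.0))
```

With `good=False` the same simulator returns the time until the correct ($N$) cascade, which is the quantity of interest for a bad product.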

III. Probability of wrong cascade for V = 1 

In the previous section, we discussed that a wrong ($N$) cascade
could happen if the product is good. In this section, we
determine the probability of this happening. Let $q = 1 - p$.
For a fixed $p$, as $x$ varies the conditions on $a_n$ and $r_n$ under which
cascades happen also change. We treat these two random
variables as the two coordinates of a 2-D Markov chain (MC).
As a result, the underlying MCs have different structures
(both in state space and transition probabilities). Despite
the complexity of these dynamics for a generic value of $x$,
interesting and non-intuitive insights can be drawn by looking
at special values of $x$. Proposition 1 shows that adding reviews
with strength equal to or even double the signal quality strictly
increases the probability of wrong cascades.

Proposition 1. 1) Having reviews twice as strong as signals
(i.e. $x = 1/2$) gives the same probability of a wrong cascade as
having reviews as strong as signals (i.e. $x = 1$), and

2) Both cases give a higher probability of a wrong cascade
compared to having no reviews.


Proof. 1) The probabilities of a wrong cascade can be calculated
using Markov chain analysis. The corresponding MCs
are shown in Fig. 1 and 2, where the states denote $h_n$. For
$x = 1$ (see Fig. 1), let $b_i$ be the asymptotic wrong cascade
probability starting from state $i$. We solve for $b_0$ using the
following system of linear equations:

$$b_{-1} = pq\,b_{-1} + p^2 b_1 + q; \quad b_0 = q\,b_{-1} + pq\,b_0 + p^2 b_2;$$
$$b_1 = q\,b_0 + pq\,b_1 + p^2 b_3; \quad b_2 = (q/p)\,b_1; \quad b_3 = (q/p)^2 b_1$$

which gives $b_0 = (q/p)^2$. For $x = 1/2$ (see Fig. 2), let $c_i$ be
the asymptotic wrong cascade probability starting from state $i$.
We solve for $c_0$ using the following system of linear equations:

$$c_{-1} = p\delta\,c_2 + (1 - p\delta); \quad c_0 = (1 - p\delta)\,c_{-1} + p\delta\,c_3;$$
$$c_1 = (1 - p\delta)\,c_0 + p\delta\,c_4; \quad c_2 = \frac{1-\delta}{\delta}\,c_0 = \frac{\delta}{1-\delta}\,c_4;$$
$$c_3 = \frac{1-\delta}{\delta}\,c_1, \quad \text{where } \frac{\delta}{1-\delta} = (p/q)^2$$

which gives $c_0 = (q/p)^2 = b_0$.

2) When there are no reviews, the result from [2] gives

$$P[\text{wrong}] = \frac{(q/p)^2}{(q/p)^2 + 1} < (q/p)^2. \qquad \square$$
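The two linear systems in the proof can be checked with exact rational arithmetic. The sketch below is an independent check written for illustration: the helper names are hypothetical, and the cascade states ($b_2, b_3$ and $c_2, c_3, c_4$) have been substituted out so that each system has three unknowns.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over exact rationals; returns u with A u = b."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [row[-1] for row in M]

def wrong_prob_x1(p):
    """b_0 for x = 1 (delta = p); unknowns are (b_{-1}, b_0, b_1)."""
    q = 1 - p
    r = q / p                       # b_2 = r b_1, b_3 = r^2 b_1
    A = [[1 - p * q, 0, -p * p],
         [-q, 1 - p * q, -p * p * r],
         [0, -q, 1 - p * q - p * p * r * r]]
    return solve(A, [q, 0, 0])[1]

def wrong_prob_x_half(p):
    """c_0 for x = 1/2; unknowns are (c_{-1}, c_0, c_1)."""
    q = 1 - p
    delta = p * p / (p * p + q * q)  # from delta/(1-delta) = (p/q)^2
    rho = (1 - delta) / delta        # c_2 = rho c_0, c_3 = rho c_1, c_4 = rho^2 c_0
    a = p * delta
    A = [[1, -a * rho, 0],
         [-(1 - a), 1, -a * rho],
         [0, -((1 - a) + a * rho * rho), 1]]
    return solve(A, [1 - a, 0, 0])[1]

p = Fraction(3, 5)
print(wrong_prob_x1(p), wrong_prob_x_half(p))   # both equal (q/p)^2 = 4/9
```

For $p = 3/5$ the no-review value $(q/p)^2 / ((q/p)^2 + 1)$ is $4/13$, which is smaller than $4/9$, in line with part 2 of the proposition.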



Figure 1: State transitions for $V = 1$ and $x = 1$.



Proposition 1 suggests that one should look at regions where
reviews are even stronger. Unfortunately, Proposition 2 shows
that except for reviews of perfect strength, one cannot
guarantee a better performance with reviews for all values of
$p \in (0.5, 1)$.

Proposition 2. Assume $0 < x < 1/3$ (i.e. reviews are more
than three times as strong as the signals):

1) $P[\text{wrong}]$ decreases in the review quality, $\delta$, and

2) At $x = 0$ (i.e. perfect reviews), $P[\text{wrong}] = q^2$, which
is lower than that with no reviews.

3) For $x$ bounded away from 0, there exists a threshold
$p_0 \in (0.5, 0.75)$ such that for signal quality $p < p_0$,
we are better off having no reviews.

Proof. 1) For $0 < x < 1/3$, the underlying MC has
the form in Fig. 3, where the first and second coordinates
denote $r_n$ and $a_n$, respectively. Let $d_{i,j}$ be the asymptotic
wrong cascade probability starting from state $(i, j)$. We solve
for $d_{0,0}$ using the following system of linear equations:

$$d_{0,-1} = p\delta\,d_{1,0} + p(1-\delta) + q; \quad d_{1,i} = \frac{1-\delta}{\delta}\,d_{0,i} \ (i = 0, 1);$$
$$d_{0,1} = q\,d_{0,0} + p\delta\,d_{1,2} + p(1-\delta); \quad d_{1,2} = \left(\frac{1-\delta}{\delta}\right)^2;$$
$$d_{0,0} = q\,d_{0,-1} + p\delta\,d_{1,1} + p(1-\delta); \quad d_{0,2} = \frac{1-\delta}{\delta};$$

which gives

$$d_{0,0} = \frac{1 - p\left(2\delta - 2p\delta + 2p - p/\delta\right)}{1 - 2pq(1-\delta)},$$

which is decreasing in $\delta$.



2) For perfect reviews, a wrong cascade happens if and only
if the first two agents have $L$ signals, which happens with
probability $q^2 < \frac{(q/p)^2}{(q/p)^2 + 1}$.

3) $p_0$ is the solution to:

$$d_{0,0} = \frac{(q/p)^2}{(q/p)^2 + 1} \tag{4}$$

First, we show the existence of $p_0$ in $(0.5, 0.75)$. In fact,
(4) is equivalent to:

$$f(p) = p^3\left[6\delta + \frac{2}{\delta} - 6\right] + p^2\left[-14\delta - \frac{2}{\delta} + 10\right] + p\left[12\delta + \frac{1}{\delta} - 7\right] + (2 - 4\delta) = 0 \tag{5}$$

Note that $f(p)$ is continuous in $p$. It is easy to check that for any
$\delta \in [0.5, 1]$, $f(0.5) > 0$ and $f(0.75) < 0$. Thus, by the Intermediate
Value Theorem there exists a root $p_0 \in (0.5, 0.75)$.

Now we show that $p_0$ is the only root of $f(p)$ in $(0.5, 1)$.
Since $f(0) < 0$, $f(p)$ has another root $p_1 \in [0, 0.5)$. Moreover,
$p = 1$ is another root of $f(p)$ (note that at $p = 1$, $\delta = 1$). In
addition, since $f(p)$ is a cubic polynomial in $p$ with positive
highest order coefficient, we conclude that for $0.5 < p < p_0$,
$f(p) > 0$, so the left-hand side of (4) exceeds the right-hand side; and for $p_0 < p < 1$,
$f(p) < 0$, so the left-hand side of (4) is below the right-hand side. □
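The threshold $p_0$ can be located numerically from the closed form for $d_{0,0}$. The sketch below is one possible reading, under the assumption that $x$ is held fixed and $\delta$ is tied to $p$ through $\delta/(1-\delta) = (p/q)^{1/x}$; the function names and the bisection routine are illustrative and are not taken from the paper.

```python
def delta_from_x(p, x):
    """Review quality implied by signal quality p and a fixed x > 0."""
    t = (p / (1 - p)) ** (1 / x)
    return t / (1 + t)

def d00(p, delta):
    """Closed-form wrong cascade probability for 0 < x < 1/3."""
    q = 1 - p
    num = 1 - p * (2 * delta - 2 * p * delta + 2 * p - p / delta)
    return num / (1 - 2 * p * q * (1 - delta))

def no_review(p):
    """Wrong cascade probability without reviews: (q/p)^2 / ((q/p)^2 + 1)."""
    q = 1 - p
    return q * q / (p * p + q * q)

def threshold_p0(x, lo=0.5, hi=0.75, tol=1e-12):
    """Bisection on d00 - no_review; assumes a sign change on [lo, hi]."""
    def gap(p):
        return d00(p, delta_from_x(p, x)) - no_review(p)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid        # reviews still hurt at mid: threshold is higher
        else:
            hi = mid
    return (lo + hi) / 2

print(threshold_p0(0.25))
```

Setting `delta = 1.0` in `d00` recovers $q^2$, the perfect-review value from part 2 of the proposition.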



Figure 4: Wrong herding probability, V = 1. 

Fig. 4 illustrates both Propositions 1 and 2. In all cases,
the probability of a wrong cascade decreases in the signal quality $p$.
Moreover, except for reviews with perfect accuracy, one would
prefer having no reviews for low signal quality as $p \to 0.5$.

IV. Time until correct cascade for V = 0 

In Section II, we argued that for a bad product, only a correct
($N$) cascade can persist, and once it starts it lasts forever. In this
section, we examine the distribution of the time until a correct
cascade by determining its tail exponent. In the following let
$n > 0$. Conditioned on $V = i$, let $\{\mathcal{F}_n^i\}$ be the sequence of
sigma-algebras generated by $\{H_n\}$. Similar to the models in [6]
and [8] where reviews do not exist, in our model the Markov
process $\{\ell_n\}$ also exhibits the martingale property, as presented
in the following lemma:

Lemma 1. $\{1/\ell_n\}$ (resp. $\{\ell_n\}$) is a martingale
conditioned on $V = 0$ (resp. $V = 1$) adapted to the filtration
$\{\mathcal{F}_n^0\}$ (resp. $\{\mathcal{F}_n^1\}$).

Proof. Given $\mathcal{F}_n^0$ and $p$, $\delta$ as common knowledge,
subsequent agents know agent $n+1$'s decision rule. If agent
$n+1$ follows an $N$ cascade, $\ell_{n+1} = \ell_n$ and thus the martingale
property follows immediately. Otherwise, for $V = 0$, if agent $n+1$
follows his own signal then:

$$\begin{aligned} E[1/\ell_{n+1} \mid \mathcal{F}_n^0] &= P[A_{n+1} = N \mid V = 0]\,\frac{1-p}{p}\,\frac{1}{\ell_n} \\ &\quad + P[A_{n+1} = Y, R_{n+1} = G \mid V = 0]\,\frac{p}{1-p}\,\frac{\delta}{1-\delta}\,\frac{1}{\ell_n} \\ &\quad + P[A_{n+1} = Y, R_{n+1} = B \mid V = 0]\,\frac{p}{1-p}\,\frac{1-\delta}{\delta}\,\frac{1}{\ell_n} \\ &= 1/\ell_n. \end{aligned} \tag{6}$$

Similarly, if agent $n+1$ cascades to $Y$ when $V = 0$ then:

$$E[1/\ell_{n+1} \mid \mathcal{F}_n^0] = \delta/\ell_n + (1-\delta)/\ell_n = 1/\ell_n. \tag{7}$$

From (6), (7), it follows that $\{1/\ell_n\}$ is a martingale for $V = 0$.
For $V = 1$, a similar method shows that $\{\ell_n\}$ is a martingale.

□ 
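The one-step expectations in the proof of Lemma 1 can be verified with exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, the probabilities of each (action, review) outcome are written out from the model assumptions, and $0.5 < \delta < 1$ is assumed so that no division by zero occurs.

```python
from fractions import Fraction

def expected_ratio_bad(p, delta, cascading):
    """E[(1/l_{n+1}) / (1/l_n) | V = 0] for a Y-cascading or non-cascading agent."""
    q = 1 - p
    g = delta / (1 - delta)          # factor on 1/l after review G
    b = (1 - delta) / delta          # factor on 1/l after review B
    if cascading:
        # Under V = 0, P[G] = 1 - delta and P[B] = delta.
        return (1 - delta) * g + delta * b
    # Signal L (prob p) gives N; signal H (prob q) gives Y and then a review.
    return p * (q / p) + q * (1 - delta) * (p / q) * g + q * delta * (p / q) * b

def expected_ratio_good(p, delta, cascading):
    """E[l_{n+1} / l_n | V = 1] for a Y-cascading or non-cascading agent."""
    q = 1 - p
    g = (1 - delta) / delta          # factor on l after review G
    b = delta / (1 - delta)          # factor on l after review B
    if cascading:
        return delta * g + (1 - delta) * b
    return q * (p / q) + p * delta * (q / p) * g + p * (1 - delta) * (q / p) * b

p, delta = Fraction(7, 10), Fraction(4, 5)
print(expected_ratio_bad(p, delta, False), expected_ratio_good(p, delta, True))  # 1 1
```

All four cases evaluate to exactly 1, which is the martingale property for $\{1/\ell_n\}$ under $V = 0$ and for $\{\ell_n\}$ under $V = 1$.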

Using Lemma 1 and techniques from [1], we would like to
use the martingale property to bound the tail probability of
the time until a correct cascade. This can also be used to give
a bound on the expected time until a correct cascade. Let $X$
and $Y$ be two random variables representing the increments
$\Delta h_n = h_{n+1} - h_n$ for $h_n \in [-1, 1]$ and $h_n > 1$, respectively.
Let $f_1(\lambda)$ and $f_2(\lambda)$ be their corresponding moment
generating functions (MGFs), where $\lambda$ is a real variable.
Let $\rho = \max(f_1(\lambda), f_2(\lambda))$ and define the random process
$\{M_n\} = \{e^{\lambda h_n} / \rho^n\}$. We have:

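Lemma 2. $\{M_n\}$ is a super-martingale adapted to $\{\mathcal{F}_n^0\}$.

Proof.

$$\frac{E[M_{n+1} \mid \mathcal{F}_n^0]}{M_n} = \frac{E[e^{\lambda h_{n+1}} / \rho^{n+1} \mid \mathcal{F}_n^0]}{e^{\lambda h_n} / \rho^n} = \frac{E[e^{\lambda \Delta h_n}]}{\rho} \le \frac{\max\left(E[e^{\lambda X}], E[e^{\lambda Y}]\right)}{\rho} = 1. \qquad \square$$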


Let $\tau = \min\{n > 0 : h_n < -1\}$ be the stopping time when the
$N$ cascade happens. Now we use Lemma 2 to give an upper
bound on the tail probability of $\tau$ in the following proposition.

Proposition 3. $P[\tau > n] \le e^{\lambda} \rho^n$, where $0 < \rho < 1$.

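Proof. For feasibility, we require $0 < \rho < 1$, which implies
$\lambda \in \left(0, \ln\frac{p}{1-p}\right)$. Since $\tau$ is a stopping time, so is $n \wedge \tau$. Thus
$\{M_{n \wedge \tau}\}$ is also a super-martingale. Therefore, since $h_0 = 0$, we
have:

$$\begin{aligned} 1 = e^{\lambda h_0} = M_0 &\ge E[M_{n \wedge \tau} \mid \mathcal{F}_0^0] \\ &\ge E[M_{n \wedge \tau} \mid \tau > n, \mathcal{F}_0^0]\,P[\tau > n] \\ &= E[e^{\lambda h_{n \wedge \tau}} / \rho^{n \wedge \tau} \mid \tau > n, \mathcal{F}_0^0]\,P[\tau > n] \\ &\ge E[e^{\lambda h_n} / \rho^n \mid \tau > n, \mathcal{F}_0^0]\,P[\tau > n] \\ &\ge e^{\lambda(-1)} \rho^{-n} P[\tau > n], \quad \text{since } h_n \ge -1 \text{ when } \tau > n \\ \Rightarrow\ P[\tau > n] &\le e^{\lambda} \rho^n. \end{aligned}$$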

□ 

The above bound is a function of $n$ (the agent index), the
dummy variable $\lambda$, and the two MGFs $f_1, f_2$. Our objective
is to choose $\lambda$ (and hence $\rho$) to minimize this bound. We solve
this numerically and compare the minimum bound with the
tail probability obtained using Monte Carlo simulation for
different values of $p$ and $\delta$.
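A rough sketch of such a numerical minimization is given below. It is not the procedure used for the results in the paper: the increment distributions of $X$ and $Y$ under $V = 0$ are written out here from the model description, the grid search over $\lambda \in (0, \ln(p/(1-p)))$ is an arbitrary choice, and the names `mgfs` and `tail_bound` are hypothetical. The sketch assumes $0.5 < \delta < 1$.

```python
import math

def mgfs(lam, p, delta):
    """MGFs (f1, f2) of the increments of h_n, conditioned on V = 0.

    f1: agent follows his signal (h_n in [-1, 1]);
    f2: agent is in a Y cascade (h_n > 1).
    """
    q = 1 - p
    k = math.log(delta / (1 - delta)) / math.log(p / q)    # k = 1/x
    f1 = (p * math.exp(-lam)                               # signal L: action N
          + q * (1 - delta) * math.exp(lam * (1 + k))      # signal H, review G
          + q * delta * math.exp(lam * (1 - k)))           # signal H, review B
    f2 = (1 - delta) * math.exp(lam * k) + delta * math.exp(-lam * k)
    return f1, f2

def tail_bound(n, p, delta, grid=2000):
    """Smallest value of exp(lam) * rho(lam)^n over a grid of lam values."""
    lam_max = math.log(p / (1 - p))
    best = 1.0                       # a probability never exceeds 1
    for i in range(1, grid):
        lam = lam_max * i / grid
        rho = max(mgfs(lam, p, delta))
        best = min(best, math.exp(lam) * rho ** n)
    return best

for n in (10, 50, 200):
    print(n, tail_bound(n, 0.7, 0.8))
```

Both MGFs equal 1 at $\lambda = \ln(p/(1-p))$, which follows from the martingale property of $\{1/\ell_n\}$ and is why the search is restricted to the open interval below that value.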



Fig. 5 shows that both the simulation results and the numerical
bounds decrease as $\delta$ increases. Moreover, the higher
the value of $\delta$, the faster the rate at which the simulation and numerical
results converge.

V. Conclusions and future work 

This paper studied a simple observational Bayesian learning
model with information cascades. We assumed that subsequent
agents can perfectly observe the previous actions and, in
addition, feedback in the form of reviews that depend on
the actions. We showed that the reviews could increase the
probability that agents misinterpret the true value of a good
product. In practice, on online platforms like Yelp and Amazon,
customer reviews come with a variety of strengths.
Even though this scenario was not considered in this paper,
our results indirectly imply that a platform planner should
opt to cut out low-quality reviews and release only the
truthful ones. In fact, this strategy is already adopted by those
platforms, e.g. Amazon with verified purchase reviews, or Yelp
with filtered reviews. Moreover, our results suggest that no
matter how much the reviews are strengthened, agents might
not perform better if their prior knowledge is limited. This
implies that a platform planner should consider spending its
budget on improving both the product's marketing efficiency
and the reviews' reliability.

In future work, we plan to study the possibility of having
reviews with strengths non-homogeneously distributed across
the population. Another possible direction is to consider
reviews for both types of actions, where
agents have the option to leave a review and
not all agents exercise this option.
Figure 5: Tail probability of time until $N$ cascade, $p = 0.70$.


